ABSTRACT


Antimalarial drugs such as chloroquine, sulfadoxine-pyrimethamine and
mefloquine have become ineffective in the treatment of malaria because the malaria
parasite has developed resistance to them. The rise in resistance to older drugs has
therefore created an urgent and continuing need for the discovery and development of
novel antimalarial agents to treat drug-susceptible and drug-resistant strains of malaria.
A significant problem in malaria treatment and control is drug resistance acquired
by malaria parasites. One of the most widely studied enzymes in antimalarial drug
design, owing to its role in deoxyribonucleic acid (DNA) synthesis, is dihydrofolate
reductase (DHFR) of Plasmodium falciparum (PfDHFR-TS; TS refers to the DHFR-linked
thymidylate synthase in Plasmodium falciparum), which catalyzes the reduction of
dihydrofolate to tetrahydrofolate. Hence, this research aims to identify prospective
hits that inhibit DHFR and to optimize them for the highest efficacy and safety in
malaria treatment using a design strategy approach. From the ChEMBL database, we
retrieved cycloguanil derivatives with biological activity data (pKi). The
three-dimensional physicochemical descriptors of the compounds were computed. A
quantitative structure-activity relationship (QSAR) model was constructed, and a
molecular mechanism was deduced by docking. According to the analysis, eleven (11)
3D descriptors were found to account for the pharmacological response of the
cycloguanil derivatives, while hydrogen bonds were found to contribute to their strong
binding affinities. The generated QSAR model was validated and found to be robust, and
it can be used to predict the activity of novel compounds in the design of new antimalarials.


Keywords: Malaria, Lipophilic, QSAR model, DHFR, Cycloguanil 


1. INTRODUCTION 


Malaria is a parasitic disease transmitted by the bite of an infected female Anopheles
mosquito. Malaria in humans is usually caused by five species of Plasmodium: P. vivax,
P. malariae, P. ovale, P. knowlesi and P. falciparum. Plasmodium falciparum is the most
common and causes severe, life-threatening disease [1]. P. knowlesi is primarily a monkey
malaria parasite but causes human infection in forested regions of Southeast Asia [2].
This life-threatening disease is a major health problem, particularly in the developing
world, killing about two million people each year. Antimalarials such as chloroquine,
mefloquine and sulfadoxine-pyrimethamine have become ineffective against the disease [3].
In 2001, the World Health Organization recommended treating malaria with combination
therapy, either artemisinin-based or non-artemisinin-based, because of growing resistance
to commonly used monotherapies such as chloroquine and sulfadoxine-pyrimethamine [4].
Recently, reports from Southeast Asia and certain African countries revealed reduced
sensitivity and possible resistance of P. falciparum to artemisinin-based combination
therapy [5-7]. Similar findings were reported by the Tracking Resistance to Artemisinin
Collaboration (TRAC), a multicenter study [8]. The rise in resistance to earlier drugs has
therefore created an urgent and continuing need for the discovery and development of new
antimalarial drugs to treat drug-susceptible and drug-resistant strains of malaria.
Cycloguanil is a dihydrofolate reductase inhibitor [9] and a metabolite of the antimalarial
drug proguanil (Figure 1). Its formation in vivo is thought to be mainly responsible for the
antimalarial action of proguanil [10]. Although cycloguanil is not presently in widespread
use as an antimalarial drug, the ongoing development of resistance to current antimalarial
drugs has led to renewed interest in the use of cycloguanil in combination with other
drugs [11].
Figure 1: Structure of cycloguanil


A major problem in the treatment and control of malaria is drug resistance acquired by malaria parasites. Dihydrofolate
reductase (DHFR) of P. falciparum (PfDHFR-TS; TS denotes the DHFR-linked thymidylate synthase in P. falciparum), which catalyzes
the reduction of dihydrofolate to tetrahydrofolate, is one of the most widely studied enzymes in antimalarial drug design because of
its role in DNA synthesis [12,13]. DHFR is also regarded as a viable target in other protozoan diseases such as leishmaniasis,
trypanosomiasis and Chagas disease [14]. Resistance of the parasite to antifolates is caused mainly by mutations at the active site
of the enzyme [15-17]. The resistance-causing mutations include the single mutation S108N, the double mutation C59R+S108N, the
triple mutations N51I+C59R+S108N and C59R+S108N+I164L, and the quadruple mutation N51I+C59R+S108N+I164L. A double mutation,
A16V+S108T, is specific to resistance to the commercialized drug cycloguanil.

In recent years, several computational techniques have been applied to better understand PfDHFR-TS-ligand
interactions [18]. Before the PfDHFR-TS crystal structures were released, researchers developed homology models to investigate the
resistance mechanism of frequently used antifolate drugs [19]. In a previous study, molecular modeling and docking studies with the
GOLD program were performed to examine the interaction between pyrimethamine derivatives and DHFR (PDB ID: 1J3K). The results
were evaluated with several scoring functions, including those of GOLD, Molegro Virtual Docker, Discovery Studio and MOE, and the
Molegro Virtual Docker protein-ligand interaction score was found to correlate well with the reported binding affinity data [20].
The same authors carried out binding studies of pyrimethamine derivatives in both wild-type and mutant enzymes [21], and also
performed quantitative structure-activity relationship (QSAR) studies on this class of compounds using neural networks [22]. QSAR
studies for some of the antifolates have been published [23]. A 3D pharmacophore model was built with the Catalyst software from a
varied set of PfDHFR-TS inhibitors, including cycloguanil and pyrimethamine derivatives [24]. The AutoDock program has also been
used to predict the binding modes of folic acid-competitive DHFR antagonists identified by cell-based high-throughput
screening [25].

In summary, recent modeling studies have concentrated on the interactions of different small-molecule antagonists of DHFR
with wild-type and mutant active-site residues, using specific docking programs and 2D-QSAR studies. Nevertheless, new research is
needed on the relationship between 3D physicochemical properties and binding affinity data, and on mechanistic insight from
protein-ligand interactions, to examine molecular association precisely for this interesting and therapeutically important class of
compounds. Our report therefore has two main objectives: (1) to gain new quantitative knowledge about the structure-activity
relationship (QSAR), based on 3D descriptors, of cycloguanil derivatives bound to wild-type and mutant PfDHFR-TS, and (2) to
understand the interaction mechanisms between PfDHFR-TS and cycloguanil derivatives.


2. MATERIALS AND METHODS 


2.1. Collection of data and dataset groundwork 

The trivial name, structure, origin and biological activities (Ki) of PfDHFR inhibitors were retrieved from the ChEMBL database. A
total of 62 PfDHFR inhibitors were retrieved on the basis of available chemical structures with associated bioactivities (Ki)
(Figure 2). The bioactivities (Ki) were then converted to pKi using the expression pKi = -log10(Ki), with Ki expressed in molar
units. The chemical structures of the PfDHFR inhibitors were obtained from the ChEMBL database in SMILES format and then converted
to 3D SDF format with DataWarrior v5.0.0 [26].
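The Ki-to-pKi conversion described above can be illustrated with a minimal sketch. It assumes the Ki values are reported in nanomolar (nM); this unit and the function name `ki_to_pki` are assumptions made for illustration and are not stated in the text.

```python
import math


def ki_to_pki(ki_nm: float) -> float:
    """Convert an inhibition constant to pKi = -log10(Ki in mol/L).

    Assumption: ki_nm is given in nanomolar, so Ki [M] = ki_nm * 1e-9
    and pKi = 9 - log10(ki_nm).
    """
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    return 9.0 - math.log10(ki_nm)


# A 1 nM inhibitor has pKi 9; a 1000 nM (1 uM) inhibitor has pKi 6.
print(ki_to_pki(1.0), ki_to_pki(1000.0))
```

Under this assumption, a higher pKi corresponds to a lower Ki, that is, to tighter binding.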
Figure 2: Raw data with corresponding bioactivities (pKi)

2.2. 3D-QSAR study
A compound's structure must be encoded as molecular descriptors in order to construct a 3D-QSAR model [27]. Accordingly, the
online tool ChemoPy [28] was used to calculate descriptors in the following groups: hybrid features, constitutional properties,
electronic properties, topological properties and geometric descriptors. The calculated descriptors were organized in matrix form
and preprocessed using JFrameVWSP version 1.0, with a correlation coefficient cutoff of 0.99 and a variance cutoff of 0.0001, to
remove highly intercorrelated and near-constant (fixed-column) descriptors. The dataset of 62 molecules recovered from the
literature [29] was split into training and test sets using the Kennard-Stone method [30].
QSAR model validation is necessary to assess how well founded a developed model is [31]. This is usually achieved by assessing
the internal stability and predictive ability of the QSAR model. In this study, the developed QSAR model was validated internally
using the leave-one-out (LOO) method. One molecule was removed, the model was rebuilt on the remaining molecules and used to
predict the removed molecule, and this was repeated for every molecule to obtain the Q² (r²CV) value. Equation (1) was used to
calculate r²CV (the cross-validated regression coefficient), which describes the internal stability of the model.


$$ r^2_{CV} = 1 - \frac{\sum (Y_{pred} - Y_{obs})^2}{\sum (Y_{obs} - \bar{Y})^2} \tag{1} $$

where $\bar{Y}$ represents the mean activity value of the training set, and $Y_{pred}$ and $Y_{obs}$ represent the
corresponding predicted and observed activity values. Notably, an r²CV greater than 0.5 indicates a reasonably robust
model [32].
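The leave-one-out procedure behind Equation (1) can be sketched as follows. For brevity the sketch fits a straight line to a single descriptor rather than the multi-descriptor model used in the study; the helper names `fit_line` and `q2_loo` and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one descriptor; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def q2_loo(xs, ys):
    """Leave-one-out cross-validated r^2, following Equation (1)."""
    y_bar = mean(ys)  # mean activity of the training set
    press = 0.0       # sum of squared LOO prediction errors
    for i in range(len(xs)):
        # Rebuild the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (slope * xs[i] + intercept - ys[i]) ** 2
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Toy data (not from the study): one descriptor and a nearly linear response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
print(q2_loo(xs, ys))
```

Each compound is predicted by a model that never saw it, so the resulting value cannot exceed the ordinary fitted R² by construction of the error sum.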

After internal validation, the predictive power of the QSAR model was assessed with an external test set of
compounds that were not used in building the QSAR model. The predictive power (external validation) was
determined from the predictive R² (R²pred) in equation (2).


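$$ R^2_{pred} = 1 - \frac{\sum (Y_{pred(test)} - Y_{(test)})^2}{\sum (Y_{(test)} - \bar{Y}_{(training)})^2} \tag{2} $$

where $Y_{(test)}$ and $Y_{pred(test)}$ are the observed and predicted activity values for the test set compounds, respectively,
and $\bar{Y}_{(training)}$ is the average bioactivity of the compounds in the training set.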
An R²pred greater than 0.6 indicates acceptable predictive power for the test set molecules [27-29]. The stepwise multiple
linear regression (S-MLR) method was used to develop the QSAR model from the training dataset (48 compounds) in order to examine
potential leads against PfDHFR.
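Equation (2) translates directly into a few lines of code. The sketch below is illustrative only; the function name `r2_pred` and the numbers in the usage example are hypothetical and are not taken from the study.

```python
def r2_pred(y_test, y_pred_test, y_train_mean):
    """External predictive R^2 following Equation (2).

    y_test       -- observed activities of the test set compounds
    y_pred_test  -- model predictions for the same compounds
    y_train_mean -- mean observed activity of the training set
    """
    if len(y_test) != len(y_pred_test):
        raise ValueError("observed and predicted lists differ in length")
    num = sum((p - o) ** 2 for o, p in zip(y_test, y_pred_test))
    den = sum((o - y_train_mean) ** 2 for o in y_test)
    return 1.0 - num / den


# Hypothetical values for three test compounds.
print(r2_pred([5.0, 6.0, 7.0], [5.1, 5.9, 7.2], 6.0))
```

Note that the denominator uses the training set mean rather than the test set mean, as in the definition above.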


Figure 3: Grid box within which cycloguanil and its derivatives bind
2.4 Molecular docking

To prepare the PfDHFR target, the 3D crystal structure (PDB ID: 3UM6) was obtained from the Protein Data Bank. Discovery Studio
2017R2 was used to remove all heteroatoms, and PyMOL was used to remove non-essential water molecules. Following receptor and
ligand preparation, molecular docking was carried out with the AutoDock Vina option in PyRx, which ranks poses with a scoring
function. The comprehensive search mode of AutoDock Vina in PyRx was used for the analysis. After a successful minimization
procedure, the grid box was centered at (35.958, 11.4079, 38.1239) on the x, y and z axes, with dimensions of
25 × 25 × 25 Å, to define the binding site (Figure 3).
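For readers who prefer the command-line version of AutoDock Vina to the PyRx interface used here, the grid box above could be written to a Vina configuration file. The following is a sketch under that assumption; the file names `receptor.pdbqt` and `ligand.pdbqt` and the helper `vina_config` are hypothetical, and the search settings used in the study are not reproduced.

```python
def vina_config(receptor, ligand, center, size):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
    ]
    return "\n".join(lines) + "\n"


# Grid box reported in the text: center in Angstrom coordinates, 25 A per side.
config = vina_config(
    "receptor.pdbqt",   # hypothetical file name
    "ligand.pdbqt",     # hypothetical file name
    center=(35.958, 11.4079, 38.1239),
    size=(25, 25, 25),
)
print(config)
```

The printed text could be saved to a file and passed to Vina through its `--config` option.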


3. RESULTS AND DISCUSSION 


To analyze the diversity of the test and training sets, principal component analysis (PCA) was performed on the structural
descriptors calculated for the entire data set. This approach helped to identify homogeneities in the data and to describe the
spatial location of the samples, which guided the division of the data into training and test sets.

The PCA yielded three principal components (PC1, PC2 and PC3), which together accounted for 73.26% of the total variance:
PC1 = 37.809%, PC2 = 23.361% and PC3 = 12.09%. Since the first three principal components explain most of the variability, their
score plot is a reliable representation of the spatial distribution of the compounds. The distribution of compounds in the space
of the three principal components is shown in the plot of PC1, PC2 and PC3 (Figure 4); PC1 and PC2 cover the largest variation in
the overall data (Figure 5). These plots show that the test and training compounds are uniformly distributed in 3D space and,
consequently, that the data set can be split. Additionally, the compounds in the training set are representative of the entire
data set.
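The Kennard-Stone method [30] named in Section 2.2 selects training compounds so that they cover the descriptor space evenly. A minimal sketch of the selection rule is given below, using Euclidean distance on toy coordinates; the function name `kennard_stone` and the example points are hypothetical, and the study's actual implementation may differ.

```python
from itertools import combinations
from math import dist


def kennard_stone(points, n_train):
    """Return the indices of n_train points chosen by the Kennard-Stone rule."""
    if not 2 <= n_train <= len(points):
        raise ValueError("n_train must be between 2 and the number of points")
    # Step 1: start with the two points that are farthest apart.
    first, second = max(
        combinations(range(len(points)), 2),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [k for k in range(len(points)) if k not in selected]
    # Step 2: repeatedly add the point whose nearest selected neighbour
    # is farthest away (maximin criterion).
    while len(selected) < n_train:
        nxt = max(
            remaining,
            key=lambda k: min(dist(points[k], points[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy 2D descriptor space (not from the study).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
train_idx = kennard_stone(points, 3)
test_idx = [k for k in range(len(points)) if k not in train_idx]
print(train_idx, test_idx)
```

In the study, the same idea would be applied to the 62 compounds to obtain the 48-compound training set, leaving the remaining 14 compounds for testing.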


[Figure 4 plot: 3D score plot with axes PC1 (37.809%), PC2 (23.361%) and PC3 (12.09%); points are colored by pKi and marked by dataset (test or training).]


Figure 4: Analysis of the principal component of the test and train sets 
Figure 5: Analysis of the principal components PC1 and PC2


After this analysis, the data set was split into test and training sets in order to identify and select the most important
factors for the DHFR inhibitory activity of the 62 cycloguanil derivatives. Stepwise multiple linear regression (S-MLR) was applied
as a variable selection method to select only the most relevant combination of descriptors and to obtain the model with maximum
predictive power using the training set. The eleven (11) most important descriptors according to the S-MLR approach are
Ple, MoRSEM6, Plm, MoRSEM30, RDFC6, RDFC5, RDFM11, RDFP7, MoRSEN20, MoRSEU2 and MoRSEM3, based on a variance
threshold of 0.01 and a cross-correlation cutoff of 0.9. These descriptors were found to be related to the biological
response under study and are shown in Figure 6.
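The variance threshold and cross-correlation cutoff mentioned above can be illustrated with a simple descriptor filter. This is a sketch of the general idea only, not the tool used in the study; the function names and the toy descriptor table are hypothetical.

```python
from statistics import mean, pvariance


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def filter_descriptors(table, var_min=0.01, corr_max=0.9):
    """Keep descriptors that vary enough and are not redundant.

    table maps a descriptor name to its values over all compounds.
    A descriptor is dropped if its variance is below var_min or if its
    absolute correlation with an already kept descriptor exceeds corr_max.
    """
    kept = []
    for name, values in table.items():
        if pvariance(values) < var_min:
            continue  # near-constant column
        if any(abs(pearson(values, table[k])) > corr_max for k in kept):
            continue  # redundant with a descriptor already kept
        kept.append(name)
    return kept


# Toy descriptor table (not from the study).
table = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],   # almost perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],   # constant
    "d": [1.0, -1.0, 1.0, -1.0],
}
print(filter_descriptors(table))
```

The result depends on the order in which descriptors are visited, since the first member of a correlated pair is the one retained.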


[Figure 6 plot: mean contribution (y-axis) of each selected descriptor (x-axis): MoRSEM3, MoRSEM30, MoRSEM6, MoRSEN20, MoRSEU2, Ple, Plm, RDFC5, RDFC6, RDFM11 and RDFP7.]

Figure 6: Selected 3D descriptors with their corresponding effects
Stepwise multiple linear regression (S-MLR) analysis was carried out using DTC Lab tools [36]. A stepwise fit function was
applied, following the usual rules of statistical significance, to obtain the QSAR equation of interest. The addition or removal
of a candidate term was based on the p-value of the F statistic (Table 1). An entry tolerance of 0.05 and a removal tolerance
of 0.10 were applied. The predictive performance was assessed using leave-one-out cross-validation.


The QSAR equation describing pKi in terms of the selected 3D-descriptors is as follows: 


pKi = -0.30975 + 1.19563 Ple - 0.03316 MoRSEM6 + 0.22044 Plm + 0.12332 MoRSEM30 - 0.1015 RDFC6 + 0.12174 RDFC5
      - 0.00327 RDFM11 + 0.00194 RDFP7 - 0.16778 MoRSEN20 + 0.00564 MoRSEU2 - 0.00729 MoRSEM3

The above regression equation is statistically characterized by the square of the multiple correlation coefficient R² = 0.9984,
MSE = 0.00444, leave-one-out cross-validation Q² = 0.99681, SEE = 0.00553, SDEP (LOO) = 0.00666 and standard deviation in absolute
errors (SD, 95% data) = 0.00294. The predictive power of the model is illustrated in Figure 7.
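To show how the regression equation would be applied to a new compound, the sketch below evaluates it for a dictionary of descriptor values. The coefficients are copied from the equation above; the function name `predict_pki` and the descriptor values in the usage example are hypothetical.

```python
# Coefficients of the QSAR equation reported in the text.
INTERCEPT = -0.30975
COEFFICIENTS = {
    "Ple": 1.19563,
    "MoRSEM6": -0.03316,
    "Plm": 0.22044,
    "MoRSEM30": 0.12332,
    "RDFC6": -0.1015,
    "RDFC5": 0.12174,
    "RDFM11": -0.00327,
    "RDFP7": 0.00194,
    "MoRSEN20": -0.16778,
    "MoRSEU2": 0.00564,
    "MoRSEM3": -0.00729,
}


def predict_pki(descriptors):
    """Evaluate the linear QSAR equation for one compound."""
    missing = sorted(set(COEFFICIENTS) - set(descriptors))
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return INTERCEPT + sum(c * descriptors[n] for n, c in COEFFICIENTS.items())


# Hypothetical compound: all descriptors zero except Ple = 1.
example = dict.fromkeys(COEFFICIENTS, 0.0)
example["Ple"] = 1.0
print(predict_pki(example))
```

Such a prediction is only meaningful for compounds whose descriptor values fall within the range covered by the training set.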


Table 1: S-MLR R² based on each descriptor

Parameter   t-value    p-value   F-value      Stepwise-MLR R²
Ple         38.88588   <0.0001   1512.11139   0.93649
MoRSEM6     -11.5027   <0.0001   132.312      0.98883
Plm         8.06027    <0.0001   64.96792     0.99245
MoRSEM30    6.55022    <0.0001   42.90545     0.99453
RDFC5       5.66283    <0.0001   32.0676      0.99537
RDFC6       -6.36169   <0.0001   40.47111     0.99636
RDFM11      -4.93637   0.00002   24.3678      0.99703
RDFP7       2.30092    0.02806   5.29423      0.99753
MoRSEN20    -3.35403   0.00206   11.24953     0.99784
MoRSEU2     3.00033    0.00519   9.00199      0.99815
MoRSEM3     -2.26043   0.03073   5.10953      0.9984

Adjusted R² = 0.99786
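As a plausibility check, the adjusted R² can be recomputed from the reported R² with the standard formula. The sketch assumes n = 48 training compounds and p = 11 descriptors, as stated elsewhere in the text; because the reported R² is rounded, the result only approximately matches the tabulated value.

```python
def adjusted_r2(r2, n_samples, n_descriptors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_descriptors - 1)


# Reported R^2 = 0.9984; assumed n = 48 training compounds, p = 11 descriptors.
print(round(adjusted_r2(0.9984, 48, 11), 5))
```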
Figure 7: The predicted pKi values using MLR modeling against the experimental (observed) pKi values
The resulting QSAR model is statistically sound and easy to use for estimating the pKi of any structurally defined compound
sharing the same scaffold. It can therefore be applied to calculate the expected DHFR inhibitory potency of a designed
structure and to decide whether it is worth further consideration in the search for potential drugs.

The PC1-PC2 loading plot derived from the eleven descriptors of the best model (S-MLR) is shown in Figure 8. The loadings
indicate that compounds with higher biological activity, which lie on the right side of the plot, are most influenced by
MoRSEM3, MoRSEN20, MoRSEU2, Plm, RDFC6 and Ple, which fall on the same side. In contrast, compounds with lower biological
activity, which lie on the left side, are associated mainly with RDFM11, RDFC5, RDFP7, MoRSEM6 and MoRSEM30.
Figure 9: R² (train) and Q² (LOO) values from multiple Y-randomization tests for S-MLR
A Y-randomization test was performed to confirm that the QSAR model does not rest on chance correlations and that the
selected descriptors were not chosen by accident. In this test, new MLR models are built after permuting the dependent variable
(pKi) while leaving the independent variables (descriptors) unchanged, so the randomized models should be of low statistical
quality. If the randomized models consistently show markedly lower R² and Q² values than the original model, the original
QSAR model can be considered robust. Approximately one hundred Y-randomization trials were performed. They yielded lower
values of R² and Q², thus validating the original S-MLR model
(Figure 9).
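The logic of the Y-randomization test can be sketched in a few lines. For brevity, the sketch scores each trial with the squared correlation of a single descriptor against the shuffled response, as a stand-in for refitting the eleven-descriptor S-MLR model; the function names, the random seed and the toy data are hypothetical.

```python
import random
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation, the R^2 of a one-descriptor linear fit."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def y_randomization(xs, ys, n_trials=100, seed=0):
    """Return the R^2 values obtained after shuffling the response n_trials times."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(n_trials):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # break the link between descriptor and activity
        scores.append(r_squared(xs, shuffled))
    return scores


# Toy data (not from the study): a perfectly linear relationship.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
scores = y_randomization(xs, ys)
print(r_squared(xs, ys), sum(scores) / len(scores))
```

A genuine relationship shows up as a large gap between the R² of the original data and the average R² of the shuffled trials.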

The residuals of the predicted pKi values for the training and test sets are plotted against the experimental pKi values in
Figure 10. The model showed no proportional or systematic error overall, since the residuals are scattered randomly about the
horizontal line.


[Figure 10 plot: residuals (y-axis) against experimental pKi (x-axis), with test and training compounds marked separately.]


Figure 10: The residual against the experimental pKi by adopting S-MLR 


Molecular docking was used in this study to better understand the molecular mechanism underlying the action of
cycloguanil derivatives against DHFR. The 62 cycloguanil derivatives and the co-crystallized ligand of DHFR
(PDB ligand ID: 1CY) were docked into the binding pocket of DHFR to assess their inhibitory (antagonist) activity. Six compounds
showed better binding affinity than the standard (co-crystallized) ligand. Compounds 149732 and 149235 had the best binding
affinity, -7.2 kcal/mol, compared with -6.0 kcal/mol for the standard ligand, and are therefore considered the hit compounds
(Table 2).


Table 2: Binding affinities of cycloguanil derivatives with the highest docking score 


Ligand (CHEMBL ID) Binding Affinity (kcal/mol) 
149732 -7.2 
149235 -7.2 
149759 -6.9 
150373 -6.8 
149866 -6.8 
149843 -6.7 
Standard -6.0 
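The selection of hits from Table 2 amounts to keeping the ligands whose docking score is more negative than that of the standard. A small sketch of that comparison, using the values in the table (the variable names are illustrative):

```python
STANDARD_AFFINITY = -6.0  # kcal/mol, co-crystallized ligand

# Binding affinities (kcal/mol) from Table 2, keyed by ChEMBL ID.
affinities = {
    "149732": -7.2,
    "149235": -7.2,
    "149759": -6.9,
    "150373": -6.8,
    "149866": -6.8,
    "149843": -6.7,
}

# A more negative score means stronger predicted binding.
hits = sorted(
    (name for name, energy in affinities.items() if energy < STANDARD_AFFINITY),
    key=affinities.get,
)
print(hits)
```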
The best binding energy of -7.2 kcal/mol, obtained for 149732, is attributed to chemical interactions in the receptor
active site (Figure 11a): five (5) hydrogen bonds involving residues LYS359, TYR322 and ASP212, and two (2)
hydrophobic interactions involving residues PRO324 and LYS359 (Figure 11b and c, Table 3).


[Figure 11 image: labeled residue ASP212; interaction legend: Conventional Hydrogen Bond, Unfavorable Donor-Donor.]


Figure 11: (a) DHFR binding pocket within which the cycloguanil derivatives bind (b) 2D- and (c) 3D-interaction between 149732 
and DHFR 


Table 3: Chemical interactions between the atoms of 149732 and the DHFR binding residues

Name                          X         Y         Z         Distance (Å)   Category
N:LIGAND:H - A:TYR322:O       31.8015   6.1775    36.1515   2.12763        Hydrogen Bond
N:LIGAND:H - A:ASP212:OD1     30.2745   8.398     36.9865   1.93859        Hydrogen Bond
N:LIGAND:H - A:LYS359:O       29.0625   11.241    31.525    2.98257        Hydrogen Bond
N:LIGAND:HN - A:ASP212:OD1    29.705    9.152     36.406    2.96375        Hydrogen Bond
N:LIGAND:HN - A:LYS359:O      29.314    10.2865   30.9465   2.90736        Hydrogen Bond
N:LIGAND - A:PRO324           31.2421   9.15758   34.6303   5.16227        Hydrophobic
N:LIGAND - A:LYS359           28.2071   8.591     28.2365   4.54188        Hydrophobic


Hydrogen (H) bonds facilitate molecular interactions and thereby support a variety of cellular functions. In particular, hydrogen
bonds are thought to aid protein-ligand binding [37, 38]. Previous studies have shown that pairs of receptor-ligand hydrogen bonds
act interdependently and that such pairing corresponds to an increase in binding affinity [39].


4. CONCLUSION 

We produced a reliable QSAR model that describes the in vitro biological activity of cycloguanil derivatives
from calculated physicochemical descriptors. The model accounted for the distribution of observed and predicted values over a
wide numerical range. It therefore appears that a reliable estimate of biological activity can be made for any structurally
defined cycloguanil derivative, which can rationally guide the search for the best possible DHFR inhibitor drugs.